The Bias-Variance Tradeoff and the Randomized GACV
Authors
Abstract
We propose a new in-sample cross-validation-based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance or fit-complexity tradeoff in 'soft' classification. Soft classification refers to a learning procedure that estimates the probability that an example with a given attribute vector is in class 1 versus class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the 'true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data.
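The computational device named in the abstract is a randomized estimate of the trace of a Hessian. As a rough illustration only, not the authors' exact randomized GACV construction, the sketch below shows a Hutchinson-style randomized trace estimator, which approximates the trace from Hessian-vector products alone; the single-probe case loosely mirrors the 'single relearning with perturbed outcome data'. The function name randomized_trace and the toy Hessian are hypothetical.

```python
import numpy as np

def randomized_trace(hessian_vec_prod, dim, n_probes=10, rng=None):
    """Hutchinson-style randomized trace estimator.

    Approximates trace(H) by averaging eps^T (H eps) over random probe
    vectors eps with E[eps eps^T] = I (Rademacher probes here).  Only
    Hessian-vector products are required; H is never formed explicitly.
    """
    rng = np.random.default_rng(rng)
    estimates = []
    for _ in range(n_probes):
        eps = rng.choice([-1.0, 1.0], size=dim)  # random +/-1 probe vector
        estimates.append(eps @ hessian_vec_prod(eps))
    return float(np.mean(estimates))

# Toy check on a dense symmetric matrix standing in for a Hessian.
gen = np.random.default_rng(0)
A = gen.standard_normal((5, 5))
H = A @ A.T  # symmetric positive semi-definite
print(randomized_trace(lambda v: H @ v, dim=5, n_probes=2000, rng=1))
print(np.trace(H))  # the two numbers should be close
```

The point of such estimators is that the Hessian never has to be formed or stored: each probe costs one Hessian-vector product, which in the learning setting corresponds to one retraining on perturbed data.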
Similar resources
Generalized Forced Quantitative Randomized Response Model
A new generalized forced quantitative randomized response (GFQRR) model for estimating the population total of a sensitive variable is proposed and studied under a unified setup. The bias and variance expressions are derived under unequal probability sampling design. It is shown that the models due to Eichhorn and Hayre (1983), Bar-Lev, Bobovitch, and Boukai (2004), Liu and Chow (1976a, 1976b),...
Bias-variance analysis in estimating true query model for information retrieval
The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stabil...
Bias-Variance Techniques for Monte Carlo Optimization: Cross-validation for the CE Method
In this paper, we examine the CE method in the broad context of Monte Carlo Optimization (MCO) [Ermoliev and Norkin, 1998, Robert and Casella, 2004] and Parametric Learning (PL), a type of machine learning. A well-known overarching principle used to improve the performance of many PL algorithms is the bias-variance tradeoff [Wolpert, 1997]. This tradeoff has been used to improve PL algorithms r...
On Bias Plus Variance
This paper presents a Bayesian additive “correction” to the familiar quadratic loss bias-plus-variance formula. It then discusses some other loss-function-specific aspects of supervised learning. It ends by presenting a version of the bias-plus-variance formula appropriate for log loss, and then the Bayesian additive correction to that formula. Both the quadratic loss and log loss correction ter...
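For context, the "familiar" quadratic-loss formula referred to above is the standard decomposition of expected squared error into noise, squared bias, and variance; the paper's Bayesian additive correction itself is not reproduced here. Using the standard notation $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat f_D$ the predictor learned from training set $D$:

```latex
\mathbb{E}_{D,\varepsilon}\big[(y - \hat f_D(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(f(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\big]}_{\text{variance}}
```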
Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case
We consider the use of smoothing splines in generalized additive models with binary responses in the large data set situation. Xiang and Wahba (1996) proposed using the Generalized Approximate Cross Validation (GACV) function as a method to choose (multiple) smoothing parameters in the binary data case and demonstrated through simulation that the GACV method compares well to existing iterative ...
Publication date: 1998